
    Embedding Robotic Agents in the Social Environment

    This paper discusses the interactive vision approach, which advocates using knowledge from the human sciences on the structure and dynamics of human-human interaction in the development of machine vision systems and interactive robots. While the approach is discussed generally, particular attention is paid to the system being developed for the Aurora project, which aims to produce a robot to be used as a tool in the therapy of children with autism. The design of the machine vision system being employed is described, and ideas from the human sciences are discussed with particular reference to the Aurora system. An example architecture for a simple interactive agent, which will likely form the basis for the first implementation of this system, is briefly described, as is the hardware used for the Aurora system.
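
    As a hedged illustration of the kind of sense-decide-act loop such a simple interactive agent might use, the Python sketch below implements a toy reactive agent. The class name, the simulated distance sensor, and the behaviour set are assumptions for illustration, not the Aurora project's actual design.

        import random


        class InteractiveAgent:
            """Toy agent: perceive a distance to the user, pick a behaviour, act."""

            def perceive(self):
                # Stand-in for the machine vision system: simulate a distance
                # reading (in metres) to the interaction partner.
                return random.uniform(0.2, 3.0)

            def decide(self, distance):
                # Simple reactive policy: keep a comfortable interaction distance.
                if distance < 0.5:
                    return "retreat"
                if distance > 2.0:
                    return "approach"
                return "engage"

            def act(self, behaviour):
                print(f"executing behaviour: {behaviour}")


        agent = InteractiveAgent()
        for _ in range(3):
            agent.act(agent.decide(agent.perceive()))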

    Introduction to the special issue on Machine learning for multiple modalities in interactive systems and robots

    This special issue highlights research articles that apply machine learning to robots and other systems that interact with users through more than one modality, such as speech, gestures, and vision. For example, a robot may coordinate its speech with its actions, taking into account (audio-)visual feedback during their execution. Machine learning provides interactive systems with opportunities to improve performance not only of individual components but also of the system as a whole. However, machine learning methods that encompass multiple modalities of an interactive system are still relatively hard to find. The articles in this special issue represent examples that contribute to filling this gap.
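
    To make the multimodal coordination concrete, here is a minimal late-fusion sketch in Python: per-intent scores from a speech recognizer and a vision module are combined before the system acts. The weights, intent names, and scores are invented for illustration and are not drawn from any article in the issue.

        def fuse_modalities(speech_scores, vision_scores, w_speech=0.6, w_vision=0.4):
            """Weighted late fusion of per-intent probabilities from two modalities."""
            intents = speech_scores.keys() & vision_scores.keys()
            fused = {i: w_speech * speech_scores[i] + w_vision * vision_scores[i]
                     for i in intents}
            # Act on the intent with the highest fused score.
            return max(fused, key=fused.get)


        speech = {"greet": 0.7, "hand_over": 0.3}   # scores from speech recognizer
        vision = {"greet": 0.1, "hand_over": 0.9}   # e.g., the user extends a hand
        print(fuse_modalities(speech, vision))      # -> hand_over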

    Machine Learning for Interactive Systems: Challenges and Future Trends

    Machine learning was introduced into interactive systems more than 40 years ago, through speech recognition and computer vision. Since then, it has gained interest in the scientific community involved in human-machine interaction and has moved up the abstraction scale: from fundamental signal processing to language understanding and generation, emotion and mood recognition, and even dialogue management or robot control. So far, existing machine learning techniques have often been considered a solution to some of the problems raised by interactive systems. Yet interaction is also a source of new challenges for machine learning and offers interesting practical as well as theoretical problems to solve. In this paper, we address these challenges and describe why research in machine learning and interactive systems should converge in the future.

    Vision systems with the human in the loop

    The emerging cognitive vision paradigm deals with vision systems that apply machine learning and automatic reasoning in order to learn from what they perceive. Cognitive vision systems can rate the relevance and consistency of newly acquired knowledge, can adapt to their environment, and thus exhibit high robustness. This contribution presents vision systems that aim at flexibility and robustness. One is tailored for content-based image retrieval; the others are cognitive vision systems that constitute prototypes of visual active memories, which evaluate, gather, and integrate contextual knowledge for visual analysis. All three systems are designed to interact with human users. After discussing adaptive content-based image retrieval and object and action recognition in an office environment, we raise the issue of assessing cognitive systems. Experiences from psychologically evaluated human-machine interactions are reported, and the promising potential of psychologically grounded usability experiments is stressed.
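
    A rough sketch, under stated assumptions, of what a visual active memory might look like: detections are integrated over time, old hypotheses decay, and only consistent ones survive. The class name, decay rule, and threshold below are illustrative guesses, not the systems described above.

        class ActiveMemory:
            def __init__(self, forget=0.9):
                self.beliefs = {}          # object id -> confidence in [0, 1]
                self.forget = forget       # decay applied each cycle

            def integrate(self, detections):
                # Decay old beliefs, then blend in new detection confidences.
                for obj in self.beliefs:
                    self.beliefs[obj] *= self.forget
                for obj, conf in detections.items():
                    old = self.beliefs.get(obj, 0.0)
                    self.beliefs[obj] = old + (1.0 - old) * conf

            def consistent(self, threshold=0.5):
                # "Rate relevance and consistency": keep well-supported hypotheses.
                return {o: c for o, c in self.beliefs.items() if c >= threshold}


        memory = ActiveMemory()
        memory.integrate({"cup": 0.8, "stapler": 0.3})
        memory.integrate({"cup": 0.7})
        print(memory.consistent())   # the repeatedly seen cup survives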

    An augmented reality platform for interactive aerodynamic design and analysis

    While modern CFD tools can provide the user with reliable and accurate simulations, there is a strong need for interactive design and analysis tools. State-of-the-art CFD software demands massive resources: CPU time for simulation, user effort for interaction, and GPU time for rendering and analysis. In this work, we develop an innovative tool that provides a seamless bridge between artistic design and engineering analysis. The platform has three main ingredients: computer vision to avoid lengthy user interaction at the pre-processing stage, machine learning to avoid costly CFD simulations, and augmented reality for agile and interactive post-processing of the results.
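
    The "machine learning instead of costly CFD" ingredient can be sketched as a regression surrogate trained on precomputed simulation results; a new design then gets an instant estimate with no solver in the loop. The synthetic data, the drag-coefficient target, and the random-forest model below are assumptions for illustration, not the platform's implementation.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)

        # Pretend training set: 4 shape parameters per design, with a drag
        # coefficient that would normally come from offline CFD runs.
        X_train = rng.uniform(0.0, 1.0, size=(200, 4))
        y_train = 0.02 + 0.1 * X_train[:, 1] ** 2 + 0.01 * rng.normal(size=200)

        surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
        surrogate.fit(X_train, y_train)

        # Interactive use: an instant estimate for a new design, no CFD solve.
        new_design = np.array([[0.4, 0.6, 0.2, 0.9]])
        print(f"predicted drag coefficient: {surrogate.predict(new_design)[0]:.4f}")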

    A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks

    We present a demonstration of a neural interactive-predictive system for tackling multimodal sequence-to-sequence tasks. The system generates text predictions for different sequence-to-sequence tasks: machine translation, and image and video captioning. These predictions are revised by a human agent, who introduces corrections in the form of characters. The system reacts to each correction, providing alternative hypotheses that comply with the feedback provided by the user. The final objective is to reduce the human effort required during this correction process. The system is implemented following a client-server architecture: a website communicates with the neural model, which is hosted on a local server, and from this website the different tasks can be tackled following the interactive-predictive framework. We open-source all the code developed for building this system. The demonstration is hosted at http://casmacat.prhlt.upv.es/interactive-seq2seq. (ACL 2019 system demonstration.)
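
    The interactive-predictive protocol itself is simple to sketch: the system proposes a hypothesis, the user validates a prefix by correcting one character, and the system regenerates a hypothesis constrained to that prefix. The toy completion model below stands in for the neural decoder; it is a minimal sketch of the protocol, not the demonstrated system.

        def toy_model(prefix):
            # Stand-in for neural decoding with a forced (validated) prefix.
            completions = {"": "the cat sat", "the d": "the dog sat on the mat"}
            best = max((p for p in completions if prefix.startswith(p)), key=len)
            return prefix + completions[best][len(prefix):]


        hypothesis = toy_model("")
        print("system:", hypothesis)             # -> the cat sat

        # The user corrects the first wrong character ('c' -> 'd' at index 4);
        # everything before the correction counts as a validated prefix.
        validated_prefix = hypothesis[:4] + "d"
        hypothesis = toy_model(validated_prefix)
        print("system:", hypothesis)             # -> the dog sat on the mat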

    Online learning and detection of faces with low human supervision

    We present an efficient, online, and interactive approach for computing a classifier, called Wild Lady Ferns (WiLFs), for face learning and detection with low human supervision. On the one hand, WiLFs combine online boosting and extremely randomized trees (Random Ferns) to progressively compute an efficient and discriminative classifier. On the other hand, WiLFs use an interactive human-machine approach that combines two complementary learning strategies to considerably reduce the degree of human supervision during learning. The first strategy is query-by-boosting active learning, which requests human assistance over difficult samples as a function of the classifier confidence; the second is memory-based learning, which uses Exemplar-based Nearest Neighbors (ENN) to assist the classifier automatically. A pre-trained Convolutional Neural Network (CNN) is used to perform ENN with high-level feature descriptors. The proposed approach is therefore fast (WiLFs run at 1 FPS using code that is not fully optimized), accurate (we obtain detection rates over 82% on complex datasets), and labor-saving (human assistance percentages below 20%). As a byproduct, we demonstrate that WiLFs also perform semi-automatic annotation during learning: while the classifier is being computed, WiLFs discover face instances in input images which are subsequently used to train the classifier online. The advantages of our approach are demonstrated on synthetic and publicly available databases, showing detection rates comparable to offline approaches that require larger amounts of handmade training data.
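
    The two assistance strategies can be sketched as a labeling triage: the classifier keeps confident predictions, the exemplar memory answers when a close neighbour exists, and only the remaining hard samples are sent to the human. Thresholds, distances, and function names below are assumptions for illustration, not the WiLFs implementation.

        import numpy as np


        def request_label(x, pred, conf, memory_X, memory_y, ask_human,
                          conf_thresh=0.8, nn_radius=1.0):
            """Decide who labels sample x: the classifier, the memory, or a human."""
            # 1. The classifier is confident: trust its own prediction.
            if conf >= conf_thresh:
                return pred
            # 2. A stored exemplar is close enough: the memory answers
            #    automatically (standing in for the ENN strategy).
            dists = np.linalg.norm(memory_X - x, axis=1)
            if dists.min() <= nn_radius:
                return int(memory_y[int(np.argmin(dists))])
            # 3. Hard, ambiguous sample: query the human (active learning).
            return ask_human(x)


        memory_X = np.array([[0.0, 0.0], [5.0, 5.0]])   # stored feature exemplars
        memory_y = np.array([0, 1])                     # their labels
        print(request_label(np.array([4.6, 5.2]), pred=0, conf=0.55,
                            memory_X=memory_X, memory_y=memory_y,
                            ask_human=lambda x: 1))     # memory answers: 1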

    Crowdsourcing in Computer Vision

    Computer vision systems require large amounts of manually annotated data to properly learn challenging visual concepts. Crowdsourcing platforms offer an inexpensive way to capture human knowledge and understanding for a vast number of visual perception tasks. In this survey, we describe the types of annotations computer vision researchers have collected using crowdsourcing, and how they have ensured that this data is of high quality while annotation effort is minimized. We begin by discussing data collection on both classic (e.g., object recognition) and recent (e.g., visual storytelling) vision tasks. We then summarize key design decisions for creating effective data collection interfaces and workflows, and present strategies for intelligently selecting the most important data instances to annotate. Finally, we conclude with some thoughts on the future of crowdsourcing in computer vision. (A 69-page meta-review of the field; Foundations and Trends in Computer Graphics and Vision.)
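
    One common instance of "intelligently selecting the most important data instances to annotate" is uncertainty sampling: examples whose predicted probabilities sit closest to 0.5 are sent to the crowd first. The sketch below is a generic illustration of that idea, not a method prescribed by the survey.

        import numpy as np


        def select_for_annotation(probs, budget):
            """Return indices of the `budget` most uncertain unlabeled examples."""
            probs = np.asarray(probs)
            uncertainty = 1.0 - 2.0 * np.abs(probs - 0.5)   # 1 at p=0.5, 0 at 0 or 1
            return np.argsort(-uncertainty)[:budget]


        scores = [0.97, 0.51, 0.08, 0.45, 0.88]          # classifier probabilities
        print(select_for_annotation(scores, budget=2))   # -> [1 3]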